Suppose you wanted to measure the effectiveness of Metababoost™, a new weight-loss drink that has become very popular recently.
You conduct a large tracking survey where you are able to re-interview the same people at various points over 12 months.
You are interested in how BMI changes amongst people who regularly consume Metababoost™, compared against those who do not.
After one year, you find that people who regularly consume Metababoost™ had a larger drop in BMI than people in the comparison group. The difference is 1.5 kg/m2, on average.
Illustration of potential outcomes for the change in BMI, depending on whether or not an individual consumes Metababoost™

| | BMI Change if No Metababoost™ | BMI Change if Metababoost™ | Difference |
|---|---|---|---|
| Alex | 2 | 0 | -2 |
| Bonnie | 1 | 0 | -1 |
| Colin | 0 | 0 | 0 |
| Danielle | 0 | -3 | -3 |
| Earl | -3 | -6 | -3 |
| Fiona | -4 | -4 | 0 |
| Gaston | -6 | -7 | -1 |
| Hermine | -8 | -10 | -2 |
| AVERAGE | -2.25 | -3.75 | -1.5 |
To say that Metababoost™ causes a BMI drop of 1.5 kg/m2, we mean that in an imaginary counterfactual world where the people who actually drank Metababoost™ instead did not drink it, their BMI would be 1.5 kg/m2 higher, on average.
Similarly, we could say that in a counterfactual world where the people who actually didn’t drink Metababoost™ had instead consumed it regularly, their BMI would be 1.5 kg/m2 lower, on average.
This is the idea behind causation within the potential outcomes framework.
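As a minimal sketch, the averages in the potential outcomes table can be reproduced in a few lines of Python. The variable names (`potential_outcomes`, `y0`, `y1`) are illustrative choices and not part of the original example.

```python
# Potential outcomes copied from the table above:
# name -> (BMI change if no Metababoost, BMI change if Metababoost)
potential_outcomes = {
    "Alex": (2, 0),
    "Bonnie": (1, 0),
    "Colin": (0, 0),
    "Danielle": (0, -3),
    "Earl": (-3, -6),
    "Fiona": (-4, -4),
    "Gaston": (-6, -7),
    "Hermine": (-8, -10),
}

# Individual treatment effect: outcome with the drink minus outcome without it.
individual_effects = {
    name: y1 - y0 for name, (y0, y1) in potential_outcomes.items()
}

n = len(potential_outcomes)
mean_y0 = sum(y0 for y0, _ in potential_outcomes.values()) / n
mean_y1 = sum(y1 for _, y1 in potential_outcomes.values()) / n
ate = mean_y1 - mean_y0  # average treatment effect

print(mean_y0, mean_y1, ate)  # -2.25 -3.75 -1.5
```

Note that the average of the individual differences equals the difference of the two column averages, which is why the bottom row of the table is internally consistent.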
Small group exercise: Think about a causal claim that you would be interested in evaluating (this doesn’t have to be related to ethnic diversity). How would you state this causal claim in the potential outcomes framework?
If we could observe both potential outcomes for everyone (as in the above table), then finding evidence of causation would be easy!
Of course, we cannot observe these counterfactual worlds. In the real world, people either took Metababoost™ or they didn’t. In other words, our real-world data look something like this:
Illustration of observed change in BMI for people who do and don’t drink Metababoost™

| | No Metababoost™ | Metababoost™ | Difference |
|---|---|---|---|
| Alex | 2 | ? | |
| Bonnie | 0 | ? | |
| Colin | 0 | ? | |
| Danielle | 0 | ? | |
| Earl | -6 | ? | |
| Fiona | -4 | ? | |
| Gaston | -7 | ? | |
| Hermine | -8 | ? | |
Since we do not observe counterfactual outcomes, how can we estimate a causal effect?
As it turns out, we cannot estimate a separate treatment effect for each individual (why not?).
But we can estimate the average treatment effect across all individuals…
Imagine you had a box with a large number of tickets inside. On each ticket is written a value from 0 to 50. Your task is to estimate the average value of the tickets in the box. You randomly choose 100 tickets from the box, and the average on these tickets is 35.
What is your best estimate for the average value of the tickets in the whole box?
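The ticket example can be tried out with a short simulation. This is only a sketch: the box below is hypothetical (10,000 tickets with values drawn uniformly from 0 to 50), so its average will not match the 35 in the example above. The point is that the average of 100 randomly drawn tickets lands close to the average of the whole box.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical box: 10,000 tickets, each with a whole number from 0 to 50.
box = [rng.randint(0, 50) for _ in range(10_000)]
box_mean = sum(box) / len(box)

# Draw 100 tickets at random, without replacement.
sample = rng.sample(box, 100)
sample_mean = sum(sample) / len(sample)

print(f"box mean: {box_mean:.2f}, sample mean: {sample_mean:.2f}")
```

In practice you never get to compute `box_mean`; the simulation includes it only so you can see how close the sample average comes.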
Returning to our working example, imagine you had a population of 1000 people. You randomly assign 500 of them to drink Metababoost™ for a year (and you make sure they actually do it). Let’s call these people the treatment group (T), and let’s call the change in BMI you measure for these people their treatment outcomes.
The other 500 people constitute the control group (C), and you make sure that they do not consume any Metababoost™ during the year. Their changes in BMI constitute the control outcomes.
Just as you can use the average value on your 100 randomly drawn tickets to estimate the average value of all of the tickets in the box, you can think of the observed treatment outcomes as a random sample of all potential treatment outcomes. Thus, the average of these observed treatment outcomes forms your estimate of the average of all potential treatment outcomes.
Similarly, the average of your observed control outcomes forms your estimate of the average of all potential control outcomes.
Illustration of observed change in BMI for people randomly assigned to drink Metababoost™

| | No Metababoost™ | Metababoost™ | Difference |
|---|---|---|---|
| Subject1 | 2 | ? | |
| Subject2 | -5 | ? | |
| Subject3 | 0 | ? | |
| Subject4 | -3 | ? | |
| … | |||
| Subject999 | -2 | ? | |
| Subject1000 | -1 | ? | |
| AVERAGE | -2.25 | -3.75 | -1.5 |
In the table above, even though we cannot observe all of the potential outcomes, we can nonetheless estimate their averages.
Taking the difference between these two estimates yields your average treatment effect (ATE), or the average causal effect of Metababoost™.
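The following sketch simulates this randomized experiment. Everything about the population is an assumption made for illustration: the potential outcomes are drawn from normal distributions with made-up means and spreads, and the names `population`, `treated`, and `control` are hypothetical. The simulation knows both potential outcomes for everyone, so it can compare the estimate with the true ATE, something you cannot do with real data.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 1000 people. y0 is the BMI change without the
# drink; y1 is the BMI change with it. All numbers here are made up.
population = []
for _ in range(1000):
    y0 = rng.gauss(-2.0, 3.0)
    effect = rng.gauss(-1.5, 1.0)
    population.append((y0, y0 + effect))

true_ate = sum(y1 - y0 for y0, y1 in population) / len(population)

# Randomly assign exactly 500 people to treatment; the rest are controls.
ids = list(range(1000))
rng.shuffle(ids)
treated, control = ids[:500], ids[500:]

# We only "observe" y1 for the treated and y0 for the controls.
mean_treated = sum(population[i][1] for i in treated) / len(treated)
mean_control = sum(population[i][0] for i in control) / len(control)
estimated_ate = mean_treated - mean_control

print(f"true ATE: {true_ate:.2f}, estimated ATE: {estimated_ate:.2f}")
```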
NOTE: this only works because you have randomly allocated people into T and C.
Recall that the fundamental problem of causal inference arises because people may self-select into T or C.
To return to our working example, the people who choose to drink Metababoost™ may be different in terms of their potential outcomes from the people who choose not to drink it. For instance, suppose that people (e.g. Hermine) who cared a lot about diet and exercise also bought Metababoost™, while those (e.g. Colin) who don’t care so much about fitness ignored the whole Metababoost™ fad.
For example, allowing people to self-select into treatment might yield the following:
Illustration of observed change in BMI depending on whether people choose to drink Metababoost™

| | BMI Change if No Metababoost™ | BMI Change if Metababoost™ | Difference |
|---|---|---|---|
| Alex | 2 | ? | |
| Bonnie | 1 | ? | |
| Colin | 0 | ? | |
| Danielle | 0 | ? | |
| Earl | ? | -6 | |
| Fiona | ? | -4 | |
| Gaston | ? | -7 | |
| Hermine | ? | -10 | |
| AVERAGE | 0.75 | -6.75 | -7.5 |
Comparing the above table to the full schedule of potential outcomes, we can see that people in the treatment group would have lost a lot of weight anyway, even if they hadn’t drunk Metababoost™, while people in the control group would not have lost very much weight, even if they had bought Metababoost™.
But since we only observe treatment outcomes for fitness freaks and control outcomes for couch potatoes, we overestimate the average effect of Metababoost™.
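To see the size of the overestimate, here is a small sketch that combines the full schedule of potential outcomes with the self-selected groups. The split into drinkers and non-drinkers is the one implied by the averages in the self-selection table (Earl, Fiona, Gaston and Hermine drink; the others do not); the variable names are illustrative.

```python
# Potential outcomes from the first table:
# name -> (BMI change if no Metababoost, BMI change if Metababoost)
potential_outcomes = {
    "Alex": (2, 0),
    "Bonnie": (1, 0),
    "Colin": (0, 0),
    "Danielle": (0, -3),
    "Earl": (-3, -6),
    "Fiona": (-4, -4),
    "Gaston": (-6, -7),
    "Hermine": (-8, -10),
}

# People who chose to drink Metababoost in the self-selection table.
drinkers = {"Earl", "Fiona", "Gaston", "Hermine"}


def mean(values):
    return sum(values) / len(values)


# What we actually observe: y1 for drinkers, y0 for everyone else.
observed_treated = [
    y1 for name, (y0, y1) in potential_outcomes.items() if name in drinkers
]
observed_control = [
    y0 for name, (y0, y1) in potential_outcomes.items() if name not in drinkers
]
naive_estimate = mean(observed_treated) - mean(observed_control)

# What we cannot observe: how the drinkers would have done without the drink.
drinkers_without_drink = mean(
    [y0 for name, (y0, y1) in potential_outcomes.items() if name in drinkers]
)

true_ate = mean([y1 - y0 for y0, y1 in potential_outcomes.values()])

print(naive_estimate)          # -7.5
print(true_ate)                # -1.5
print(drinkers_without_drink)  # -5.25
```

The drinkers would have averaged a change of -5.25 even without the drink, against 0.75 for the non-drinkers. That pre-existing gap of 6 is what inflates the naive comparison from -1.5 to -7.5.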
More broadly, if we allow people to self-select into T and C, we can no longer consider the observed treatment/control outcomes as a random sample of all potential treatment/control outcomes. Thus, our basis for assessing causality falls apart.
In the absence of random assignment, estimates of the ATE may be biased: that is, if we reran this experiment a large number of times, our estimates would tend to be systematically too large or too small.
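The flip side is that random assignment gives estimates that are right on average. As a sketch, assume exactly four of the eight people in the potential outcomes table are assigned to treatment, with every such split equally likely (an assumption made for illustration). There are only 70 possible splits, so we can list them all instead of rerunning the experiment many times.

```python
from fractions import Fraction
from itertools import combinations

# Potential outcomes from the first table, in the order
# Alex, Bonnie, Colin, Danielle, Earl, Fiona, Gaston, Hermine.
y0 = [2, 1, 0, 0, -3, -4, -6, -8]    # BMI change if no Metababoost
y1 = [0, 0, 0, -3, -6, -4, -7, -10]  # BMI change if Metababoost
n = len(y0)

# For every way of assigning 4 of the 8 people to treatment, compute the
# difference in means we would have observed under that assignment.
estimates = []
for treated in combinations(range(n), 4):
    control = [i for i in range(n) if i not in treated]
    mean_treated = Fraction(sum(y1[i] for i in treated), 4)
    mean_control = Fraction(sum(y0[i] for i in control), 4)
    estimates.append(mean_treated - mean_control)

# Average estimate over all equally likely assignments vs. the true ATE.
expected_estimate = sum(estimates) / len(estimates)
true_ate = Fraction(sum(y1) - sum(y0), n)

print(len(estimates), expected_estimate, true_ate)  # 70 -3/2 -3/2
```

Any single assignment can still give an estimate far from -1.5, but the estimates average out to exactly the true ATE. No such guarantee holds when people choose their own group.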
Here is another way of thinking about this problem: suppose there is a third variable, motivation to get fit, which is correlated with both drinking Metababoost™ and change in BMI.
Here we can say that motivation confounds the statistical relationship between drinking Metababoost™ and change in BMI. Since motivation is not measured, it constitutes a source of omitted variable bias.
Randomization solves this problem by making sure that, on average, motivation (plus all other possible confounding variables) is equalized between T and C.
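The sketch below illustrates both points with simulated data. All of the numbers are assumptions chosen for illustration: a constant true effect of -1.5, a standard-normal `motivation` score that lowers BMI on its own, and a logistic rule under which more motivated people are more likely to choose the drink. The function names `simulate` and `summarize` are hypothetical helpers.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible
TRUE_EFFECT = -1.5      # assumed constant effect of the drink


def simulate(n, self_select):
    """Return a list of (motivation, drinks, bmi_change) tuples."""
    rows = []
    for _ in range(n):
        motivation = rng.gauss(0.0, 1.0)
        if self_select:
            # More motivated people are more likely to choose the drink.
            p_drink = 1 / (1 + math.exp(-2 * motivation))
        else:
            p_drink = 0.5  # random assignment: a fair coin flip
        drinks = rng.random() < p_drink
        # Motivation lowers BMI whether or not the person drinks.
        bmi_change = (
            -2.0 * motivation
            + (TRUE_EFFECT if drinks else 0.0)
            + rng.gauss(0.0, 1.0)
        )
        rows.append((motivation, drinks, bmi_change))
    return rows


def mean(values):
    return sum(values) / len(values)


def summarize(rows):
    """Return (gap in mean motivation, gap in mean BMI change), T minus C."""
    t = [r for r in rows if r[1]]
    c = [r for r in rows if not r[1]]
    motivation_gap = mean([r[0] for r in t]) - mean([r[0] for r in c])
    bmi_gap = mean([r[2] for r in t]) - mean([r[2] for r in c])
    return motivation_gap, bmi_gap


selected_gap, selected_estimate = summarize(simulate(20_000, self_select=True))
random_gap, random_estimate = summarize(simulate(20_000, self_select=False))

print(f"self-selected: motivation gap {selected_gap:.2f}, estimate {selected_estimate:.2f}")
print(f"randomized:    motivation gap {random_gap:.2f}, estimate {random_estimate:.2f}")
```

Under self-selection the drinkers are more motivated than the non-drinkers, and the difference in means overstates the effect. Under random assignment the motivation gap is close to zero and the estimate is close to the assumed effect, even though motivation is never used in the calculation.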
Now let’s return to the task of estimating the causal effect of (contextual) diversity. Think about Enos’ experiment.
Suppose your friend sees Enos’ results and says:
“Nice experiment, but he’s just documenting a temporary reaction to the unexpected appearance of Latinos in all-white suburbs. Over time, however, people are going to become more comfortable with diversity. For example, cities have historically been magnets for immigration, and the people living there seem to have no problem with diversity.”